application to case study
sanity checks
remove and retrain
Observation [sanity_checks]: Many saliency methods are invariant to model parameters and data labels.
Visual appeal \(\neq\) faithfulness.
Guided Backprop ≈ edge detector (no model/data dependence).
Highlights boundaries via non-linear image processing, not learned features.
Results: Gradients/IG pass. Guided Backprop fails.
<img src="figures/cascading_randomization.png" class="center"/>
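The parameter-randomization test can be sketched with a deliberately tiny stand-in: for a linear model \(f(x) = w^\top x\), the input gradient is \(w\) itself, so gradient saliency reduces to \(|w|\) and "randomizing a layer" reduces to re-drawing \(w\). All names here are illustrative, not from the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def spearman(u, v):
    """Spearman rank correlation via Pearson correlation of ranks (numpy-only)."""
    ru = np.argsort(np.argsort(u))
    rv = np.argsort(np.argsort(v))
    return np.corrcoef(ru, rv)[0, 1]

# Toy stand-in: for a linear model f(x) = w @ x, the input gradient is w,
# so gradient saliency is |w| regardless of the input x.
w_trained = rng.normal(size=100)
saliency_trained = np.abs(w_trained)

# Parameter randomization test: re-draw the weights, recompute saliency.
w_randomized = rng.normal(size=100)
saliency_randomized = np.abs(w_randomized)

# A parameter-sensitive method loses similarity to the trained-model map;
# an insensitive method (like an edge detector) would not change at all.
rho = spearman(saliency_trained, saliency_randomized)
print(abs(rho) < 0.5)
```

The quantity compared in the paper is exactly this kind of similarity between saliency maps before and after (cascading) randomization; here independent draws give a rank correlation near zero.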
Train on \(\{(x_i, \tilde{y}_i)\}\), where the labels \(\tilde{y}_i\) are a random permutation of the true labels \(y_i\).
Model trained on random labels should produce noisy saliency maps.
Many methods still highlight “objects” (spurious structure).
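A minimal sketch of the data-randomization test, using a least-squares model on synthetic data and \(|w|\) as a global feature-saliency proxy (both are illustrative assumptions, not the paper's setup). With true labels, saliency mass concentrates on the informative features; with permuted labels it should fall to the chance level:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 16
informative = np.arange(4)                 # hypothetical informative positions
X = rng.normal(size=(n, p))
y = (X[:, informative].sum(axis=1) > 0).astype(float)

def fit_saliency(X, y):
    # Least-squares weights; |w| as a global feature-saliency proxy (assumption).
    w, *_ = np.linalg.lstsq(X, y - y.mean(), rcond=None)
    return np.abs(w)

sal_true = fit_saliency(X, y)
sal_rand = fit_saliency(X, rng.permutation(y))   # data randomization test

# Fraction of saliency mass on the truly informative 4 of 16 features:
frac_true = sal_true[informative].sum() / sal_true.sum()
frac_rand = sal_rand[informative].sum() / sal_rand.sum()
print(frac_true, frac_rand)  # high vs. roughly 4/16 = 0.25 (chance)
```

A saliency method that still "highlights objects" after label permutation behaves like the second case should not: its output reflects input structure rather than what the model learned.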
Can we evaluate feature importance without relying on human visual bias?
Masking creates a distribution shift: the model sees out-of-distribution (OOD) inputs (e.g., images with black squares).
Performance drop reflects either:
- Information loss (what we want), OR
- Mask artifact (confound)
Solution: Retrain on masked data to isolate information loss.
Remove the top fraction \(t\) of pixels by importance, retrain from scratch.
Measure accuracy drop as function of \(t\).
Better method \(\implies\) steeper drop (removed truly informative pixels).
Generate \(x \in \mathbb{R}^{16}\) with 4 informative features:
\[x = \frac{az}{10} + d\eta + \frac{\epsilon}{10}, \quad y = \mathbb{1}(z > 0)\]
where \(z, \eta, \epsilon \sim \mathcal{N}(0,1)\), \(a \in \mathbb{R}^{16}\) is nonzero only in 4 positions, and \(d \in \mathbb{R}^{16}\) is a fixed vector.
Fit least-squares model.
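The generative model above can be sketched directly. The specific informative positions, the values of \(a\), and the vector \(d\) are not given in the text, so the choices below are illustrative; \(z\) and \(\eta\) are taken as per-sample scalars and \(\epsilon\) as per-sample noise in \(\mathbb{R}^{16}\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5000, 16

# Assumed specifics: which 4 positions are informative, and the fixed vector d.
a = np.zeros(p)
a[[0, 4, 8, 12]] = 1.0               # 4 informative positions (illustrative)
d = rng.normal(size=p)               # fixed distractor vector (assumption)

z = rng.normal(size=(n, 1))          # scalar latent per sample
eta = rng.normal(size=(n, 1))        # scalar nuisance per sample
eps = rng.normal(size=(n, p))        # per-feature noise

X = (a * z) / 10 + d * eta + eps / 10
y = (z[:, 0] > 0).astype(float)

# Fit the least-squares model; |w| is the saliency we can check against ground truth.
w, *_ = np.linalg.lstsq(X, y - y.mean(), rcond=None)

# The 4 informative coordinates should dominate in magnitude.
print(np.mean(np.abs(w[a != 0])) > np.mean(np.abs(w[a == 0])))
```

Because the informative positions are known by construction, this setup lets us score any saliency method objectively, without appealing to human visual judgment.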
Results
Lower curve \(\implies\) more informative feature selection.
Key finding: Guided Backprop, IG, Gradient ≈ random removal on ImageNet.
These methods failed to identify informative features.
Ensemble variants of IG do better.
For \(t \in \{0.1, 0.3, 0.5, 0.7, 0.9\}\):
1. Generate saliency \(\varphi\) for dataset \(\mathcal{D}\)
2. Mask the top \(t\) fraction of pixels with the mean: \(\tilde{x}_i = x_i \mathbb{1}(\varphi_i < q_t) + \mu \mathbb{1}(\varphi_i \geq q_t)\), where \(q_t\) is the \((1-t)\)-quantile of \(\varphi\)
3. Initialize \(f_{\text{new}}\), train on \(\{(\tilde{x}_i, y_i)\}\)
4. Record \(\text{Acc}(f_{\text{new}}, \tilde{\mathcal{D}}_{\text{test}})\)
Plot accuracy vs. \(t\).
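The loop above can be sketched end to end on the synthetic setup, with a least-squares model and \(|w|\) as a global per-feature saliency standing in for per-pixel maps. The informative positions, the vector \(d\), and the sample sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
p = 16
a = np.zeros(p)
a[:4] = 1.0                            # informative positions (illustrative)
d = rng.normal(size=p)                 # fixed distractor vector (assumption)

def make_data(n=4000):
    z, eta = rng.normal(size=(n, 1)), rng.normal(size=(n, 1))
    X = (a * z) / 10 + d * eta + rng.normal(size=(n, p)) / 10
    return X, (z[:, 0] > 0).astype(float)

def fit(X, y):
    w, *_ = np.linalg.lstsq(X, y - y.mean(), rcond=None)
    return w

X_tr, y_tr = make_data()
X_te, y_te = make_data()

phi = np.abs(fit(X_tr, y_tr))          # global saliency per feature (stand-in)
mu = X_tr.mean(axis=0)                 # mean used for masking

accs = []
for t in [0.1, 0.3, 0.5, 0.7, 0.9]:
    k = int(np.ceil(t * p))
    top = np.argsort(phi)[-k:]         # top-t fraction by importance
    Xm_tr, Xm_te = X_tr.copy(), X_te.copy()
    Xm_tr[:, top] = mu[top]            # mask train and test identically
    Xm_te[:, top] = mu[top]
    w = fit(Xm_tr, y_tr)               # retrain from scratch on masked data
    acc = np.mean((Xm_te @ w > 0) == (y_te > 0.5))
    accs.append(acc)

print(accs)  # accuracy should fall toward chance as t grows
```

A faithful saliency ranking removes the informative features first, so its accuracy curve drops steeply; a random or uninformative ranking yields a flatter curve.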